Arabic Native Language Identification

Authors

  • Shervin Malmasi
  • Mark Dras
Abstract

In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer’s first language from their writing in other languages, has been investigated mostly with English data but is now expanding to other languages. Using L2 texts from the newly released Arabic Learner Corpus, we show that a combination of three syntactic features (CFG production rules, Arabic function words and Part-of-Speech n-grams) is useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, developing teaching materials tailored to students’ native language, and forensic linguistics. Future directions are discussed.
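To make the three feature types in the abstract concrete, the following is a minimal sketch of how counts of CFG production rules, function words and Part-of-Speech n-grams might be merged into one feature dictionary for a classifier. It is an illustration under stated assumptions, not the authors' pipeline: the tag names, the function-word list, the production strings and the helper names (`pos_ngrams`, `function_word_counts`, `combine_features`) are all hypothetical, and the toy input stands in for the output of a real tagger and parser.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Count contiguous POS-tag n-grams, each joined into a single string."""
    return Counter(" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_counts(tokens, function_words):
    """Count only the tokens that appear in the given function-word list."""
    return Counter(t for t in tokens if t in function_words)


def combine_features(**groups):
    """Merge named feature groups into one flat dict with prefixed keys."""
    combined = {}
    for name, counts in groups.items():
        for key, value in counts.items():
            combined[f"{name}:{key}"] = value
    return combined


# Toy input (assumed): in practice these would come from a tagger and a parser.
tokens = ["في", "البيت", "من", "المدرسة"]
tags = ["PREP", "NOUN", "PREP", "NOUN"]            # hypothetical tag set
function_words = {"في", "من", "على"}                # hypothetical, tiny list
productions = ["PP -> PREP NP", "NP -> NOUN",
               "PP -> PREP NP", "NP -> NOUN"]       # hypothetical CFG rules

features = combine_features(
    pos2=pos_ngrams(tags, 2),
    fw=function_word_counts(tokens, function_words),
    cfg=Counter(productions),
)
```

A dictionary of this shape could then be vectorized and passed to any standard classifier; the abstract does not specify which learner or feature weighting was used, so none is assumed here.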

Similar Resources

Perceptual confusions of American-English vowels and consonants by native Arabic bilinguals.

This study investigated the perception of American-English (AE) vowels and consonants by young adults who were either (a) early Arabic-English bilinguals whose native language was Arabic or (b) native speakers of the English dialects spoken in the United Arab Emirates (UAE), where both groups were studying. In a closed-set format, participants were asked to identify 12 AE vowels presented in /h...

Formulation of Language Teachers' Identity in the Situated Learning of Language Teaching Community of Practice

A community of practice may shape and reshape the identity of members of the community through providing them with situated learning or learning environment. This study, therefore, is to clarify the salient learning-based features of the language teaching community of practice that might formulate the identity of language teachers. To this end, the study examined how learning situations in two ...

Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech

We propose in this work a novel acoustic phonetic study for Arabic people suffering from language disabilities and non-native learners of Arabic language to classify Arabic continuous speech to pathological or healthy and to identify phonemes that pose pronunciation problems (case of pathological speeches). The main idea can be summarized in comparing between the phonetic model reference to Ara...

String Kernels for Native Language Identification: Insights from Behind the Curtains

The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using...

Patterns of Misperception of Arabic Consonants

There has been much investigation into perception of speech sounds, demonstrating a range of influences including listeners’ native language (e.g. Cutler et al. 2004), the sounds’ position in the syllable (e.g. Wang and Bilger 1973), and the presence of different types of masking noise (e.g. Phatak, Lovitt, and Allen 2008). However, there is no data on patterns of misperception of guttural cons...

Publication date: 2014